Abstract

This paper investigates design techniques which may 
be applied to make program testing easier. We present 
methods for modifying a program to generate addi- 
tional data which we refer to as a certification trail. 
This additional data is designed to allow the program 
output to be checked more quickly and effectively. Cer- 
tification trails [14, 16] have heretofore been described 
primarily from a theoretical perspective. In this paper, 
we report on a comprehensive attempt to assess experi-
mentally the performance and overall value of the certi-
fication trail method. The method has been applied to
nine fundamental, well-known algorithms for the fol-
lowing problems: convex hull, sorting, Huffman tree,
shortest path, closest pair, line segment intersection,
longest increasing subsequence, skyline, and Voronoi di-
agram. Run-time performance data for each of these
problems is given, and selected problems are described 
in more detail. Our results indicate that there are many 
cases in which certification trails allow for significantly 
faster overall program execution time than a 2-version 
programming approach, and also give further evidence 
of the breadth of applicability of this method. 

Keywords: Software design for testability, software
fault detection, certification trails, error monitoring, 
design diversity, data structures. 

1 Introduction 

We have examined a wide variety of fundamental 
algorithms to determine how they can be redesigned 
to allow for easier testability. To make the problem 
of testing the correctness of the output of a program 
more tractable we have found it is desirable to modify 
the program so that it generates additional data which 
we refer to as a certification trail. This additional data
is designed to allow the program output to be checked
more quickly and effectively. Our previous work on cer-
tification trails emphasized a theoretical perspective in
which we proved that the asymptotic time complexity 
of the testing process could be reduced [14, 16]. In 
this paper, we report on implementations of the cer- 
tification trail method so as to assess experimentally 
with run-time data the performance and overall value 
of the technique. We have implemented the certifica- 
tion trail method for nine fundamental and well-known 
algorithms of broad importance and applicability. For 
each algorithm, we have produced three implementa- 
tions: a version which produces the output; a version 
which produces the output and generates a certifica- 
tion trail; and a version which checks the output while 
utilizing the certification trail. Specifically, algorithms 
for the following problems are analyzed: Huffman tree,
shortest path, sorting, closest pair, line segment in-
tersection, convex hull, longest increasing subsequence,
skyline, and Voronoi diagram. The scope of the algo-
rithms considered gives credibility to the overall appli-
cability of the certification trail method. Furthermore,
comparisons of run-time data for each of the three ver- 
sions of each of the algorithms considered reveal many 
cases in which an approach using certification trails al- 
lows for significantly faster overall program execution 
time than a 2-version programming approach.


2 Introduction to Certification Trails 

First, let us consider a basic method which is used 
to perform testing to detect software faults called N-
version programming [1, 2]. This method utilizes N
teams of programmers, each independently implement- 
ing separate programs based on a problem specifica- 
tion. The programs are executed on the same input and 
the outputs are compared. Errors caused by software 
faults are detected whenever the independently writ- 
ten programs do not generate coincident errors. Thus 
the technique exploits design diversity. Also, note that 
the method can detect hardware faults which affect the 
separate executions in distinct ways causing distinct 
outputs. It is particularly valuable for detecting errors 
caused by transient fault phenomena. The N-version 
programming method can be used to detect faults af-
ter a system has been put into production or it can be
used to detect faults in a testing phase prior to produc-
tion. If two teams are used then we refer to the method
as 2-version programming.

Figure 1: Timeline Comparison of the Certification
Trail with 2-Version Programming

The certification-trail technique is designed to provide
similar capabilities for detecting software and hardware
faults as 2-version programming but expend fewer
resources. As mentioned above the central idea is to
modify the first algorithm so that, with modest additional
overhead, it leaves behind a trail of data which we call a
certification trail. This data is chosen so that it can allow
the second algorithm to execute more quickly and/or have
a simpler structure than the first algorithm. As above, the
outputs of the two executions are compared and are
considered correct only if they agree. An illustration of
typical execution times of 2-version programming versus
the certification trail method is given in Figure 1. We
assume that the two implementations developed for
2-version programming have approximately equal
execution times. Note however that we must be careful in
defining this method or else its error detection capability
might be reduced by the introduction of data dependency
between the two program executions. For example,
suppose the first program execution contains an error
which causes an incorrect output and an incorrect trail of
data to be generated. Further suppose that no error occurs
during the execution of the second program. It still
appears possible that the execution of the second program
might use the incorrect trail to generate an incorrect
output which matches the incorrect output given by the
execution of the first program. Intuitively, the second
execution would be “fooled” by the data left behind by the
first execution. The definitions we give below exclude
this possibility. They demand that the second execution
either generate a correct answer or signal that an error
has been detected.


3 Formal Definition of a Certification 
Trail 

In this section we will give a formal definition of a 
certification trail and discuss some aspects of its real- 
izations and uses. 

Definition 3.1 A problem P is formalized as a rela- 
tion, i.e., a set of ordered pairs. Let D be the domain 
(that is, the set of inputs) of the relation P and let S 
be the range (that is, the set of possible solutions). We 
say an algorithm A solves a problem P iff for all $d \in D$
when d is input to A then an $s \in S$ is output such that
$(d, s) \in P$.

Definition 3.2 Let $P : D \to S$ be a problem. A solution
to this problem using a certification trail consists of
two functions $F_1$ and $F_2$ with the following domains and
ranges: $F_1 : D \to S \times T$ and
$F_2 : D \times T \to S \cup \{\mathrm{error}\}$.

T is the set of certification trails. The functions must
satisfy the following two properties:

(1) for all $d \in D$ there exist $s \in S$ and $t \in T$ such that
$F_1(d) = (s, t)$ and $F_2(d, t) = s$ and $(d, s) \in P$

(2) for all $d \in D$ and all $t \in T$ either
($F_2(d, t) = s$ and $(d, s) \in P$) or $F_2(d, t) = \mathrm{error}$.


We also require that $F_1$ and $F_2$ be implemented so
that they map elements which are not in their respective
domains to the error symbol. Intuitively, the first
condition states that if both parts of our solution execute
correctly, then their answers agree and are correct.
The second condition states that a correct secondary 
execution will never produce an incorrect output, i.e., 
one that is not a solution to the problem. 

The definitions above assure that the testing capabil- 
ity of the certification-trail approach is similar to that 
obtained with a 2-version programming approach dis- 
cussed earlier. That is, if a software or hardware fault 
occurs during only one of the executions then either the 
fault will be detected or the output will be a correct so- 
lution to the problem. The examples in this paper will 
indicate that this new approach can save overall execu- 
tion time. 
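
To make Definition 3.2 concrete, the following Python sketch wires a primary execution and a secondary execution together for a deliberately tiny problem, integer division with remainder. The toy problem, the function names and the ERROR sentinel are illustrative assumptions; they are not taken from the implementations measured in this paper.

```python
ERROR = "error"  # stands for the error symbol of Definition 3.2


def f1_divide(d):
    """Primary execution: solve the problem and emit a trail."""
    a, b = d                      # input d = (a, b), assumed to satisfy b > 0
    q, r = divmod(a, b)
    return (q, r), (q, r)         # (solution s, trail t); here t is a copy of s


def f2_divide(d, t):
    """Secondary execution: it never trusts the trail, it only checks it."""
    a, b = d
    if not (isinstance(t, tuple) and len(t) == 2):
        return ERROR
    q, r = t
    if not (isinstance(q, int) and isinstance(r, int)):
        return ERROR
    # Property (2): any trail that passes this test yields a correct answer.
    if a == q * b + r and 0 <= r < b:
        return (q, r)
    return ERROR


def certified_run(f1, f2, d):
    """Accept an output only if both executions agree on it."""
    s1, t = f1(d)
    s2 = f2(d, t)
    return s2 if s2 != ERROR and s1 == s2 else ERROR
```

In this sketch a corrupted trail such as (2, 7) for the input (17, 5) satisfies the equation 17 = 2 * 5 + 7 but fails the range test on the remainder, so the secondary execution reports an error instead of being "fooled".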


4 Certification Trail Examples 

In the remainder of this paper we evaluate the use 
of certification trails for nine classic problems in com- 
puter science. We have implemented algorithms for 
these problems together with other algorithms which 
generate and use certification trails. In addition, we
discuss a general technique for construction of certifi-
cation trails for algorithms using a wide range of data
structures. This technique is used to implement the 
certification trails for several of our examples. 

We provide a full description of the algorithm for the 
convex hull problem which generates a certification trail 
and a full description of the algorithm which uses that 
trail. Because of space considerations the discussion 
of the other algorithms is abbreviated. In some cases 
references to previous publications or technical reports 
which describe the algorithms more fully are given. 

The algorithms we have chosen to implement are 
not always the algorithms which have the smallest 
asymptotic time complexity. Often the asymptoti- 
cally fastest algorithms have large constants of pro- 
portionality which make them slower on the data sizes 
we examined. We modified and used some programs 
from major software distributions such as quicker-sort 
from a Berkeley Unix distribution. Fortune’s algo- 
rithm for computing the Voronoi diagram was obtained 
from an Internet site at AT&T Bell Labs. Other algo- 
rithms were based on textbook discussions. It should 
be stressed here that this research is continuing as 
we further increase our corpus of algorithm and data- 
structure implementations. 

4.1 Explanation of timing data 

We have collected timing data for the algorithms on 
a Sun SPARCstation ELC with 16MB of RAM. The 
system was run as a standalone machine in single user 
mode during the timing experiments. Timing data was 
obtained through the getrusage() system call. The user
times are reported in the data. 

Much of the data presented in the timing table is 
essentially self-explanatory relative to the certification 
trail technique and algorithms considered. However, a 
brief discussion of the table entries is appropriate. 

The column labelled Basic contains timing data 
which gives the execution time of the algorithm in pro- 
ducing the output without the generation of the certi- 
fication trail. All timing data is listed in seconds. 

The Primary Execution (Prim. Exec.) column gives 
the execution time of the algorithm in producing the 
output with the additional overhead of generating the 
certification trail. 

The Secondary Execution (Sec. Exec.) column gives 
the execution time of the algorithm in producing the 
output while using the certification trail. 

The Percent Savings (% Sav.) column records the
percentage of execution time saved by using the
certification trail method as compared to the 2-version
programming approach. This assumes that both versions
take approximately the same amount of time to execute.

The Speedup column is the ratio of the run times of 
the Basic Algorithm and the Secondary Execution. 
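
The two derived columns can be written out as a short Python sketch. The savings formula below is our reading of the column description, with the cost of 2-version programming taken as twice the Basic time; it reproduces the published entries of Tables 1 and 2, but the function names are illustrative assumptions.

```python
def percent_savings(basic, primary, secondary):
    """Percent of time saved by primary + secondary over two Basic runs."""
    two_version = 2.0 * basic            # assumed cost of 2-version programming
    return 100.0 * (two_version - (primary + secondary)) / two_version


def speedup(basic, secondary):
    """Ratio of the Basic run time to the Secondary Execution run time."""
    return basic / secondary
```

For example, the Table 1 row for 10000 points (Basic 1.38, Prim. Exec. 1.40, Sec. Exec. 0.17) gives a savings of about 43.12 percent and a speedup of about 8.12.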

For the Huffman tree data, the input size for the
Huffman tree program is the number of nodes. Each
node is given a frequency, chosen uniformly from the
integers {1, 2, ..., n}, where n is also the number of
nodes.

For the shortest path table, there are two numbers 
associated with the input size, the first is the number of 
vertices in the graph, the second the number of edges. 
A graph with the required edges is selected uniformly 
from the set of all such graphs, then tested for connect- 
edness in order to assure that paths exist to all vertices. 

For the geometric algorithms, the input size is the 
number of points (or lines) in the original data set. 
Point set input was generated by choosing points with 
integer coordinates uniformly over a large square (typ- 
ically 1,000,000 by 1,000,000 or larger square). For the 
Line Segment Intersection problem, lines were gener- 
ated by picking a line segment start point uniformly 
from a large square and picking offsets for x and y- 
coordinates from a smaller range to give the end point 
of the line segment. This was done to bound the line 
length and avoid data sets resulting in a quadratic num- 
ber of intersections. 
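
The line segment generator described above can be sketched as follows. The side of the square matches the typical value quoted in the text, but the offset bound max_offset, the function name and the use of Python's random module are assumptions made only for illustration.

```python
import random


def random_segments(n, square=1_000_000, max_offset=1_000, seed=None):
    """Generate n segments with uniformly chosen start points and short offsets."""
    rng = random.Random(seed)
    segments = []
    for _ in range(n):
        # Start point chosen uniformly from the large square.
        x0 = rng.randint(0, square)
        y0 = rng.randint(0, square)
        # Offsets drawn from a smaller range bound the segment length.
        dx = rng.randint(-max_offset, max_offset)
        dy = rng.randint(-max_offset, max_offset)
        segments.append(((x0, y0), (x0 + dx, y0 + dy)))
    return segments
```

Keeping max_offset small relative to the square keeps the expected number of intersections far below quadratic; end points may fall slightly outside the square in this sketch.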

Data for the longest increasing subsequence problem 
was produced by generating a random permutation of 
$[1..N]$ for input size $N$.

Sorting was performed on an array of pointers to 
structures. It was assumed that each structure con- 
tains an extra integer field for use in generating the 
certification trail. Sorting was performed on integer 
keys, though the technique can be used with a more 
complex key (in fact, using complex keys is very likely 
to increase the speedup achieved). Integers were chosen 
uniformly from the interval [1..1,000,000,000].

4.2 Convex Hull Example 

The convex hull problem is fundamental in the field 
of computational geometry. Our certification trail so- 
lution is based on a convex hull algorithm due to Gra- 
ham [6] called Graham’s Scan. For basic definitions in 
computational geometry see the text of Preparata and 
Shamos [11]. For simplicity in the discussion which fol-
lows we will assume the points are in general position,
e.g., no three points are collinear. It is not hard to 
remove this restriction. 

Definition 4.1 The convex hull of a set of points, T,
in the Euclidean plane is defined as the smallest convex
polygon enclosing all the points. This polygon is unique
and its vertices are a subset of the points in T. It is
specified by a counterclockwise sequence of its vertices.

The algorithm given below constructs the convex
hull incrementally in a counterclockwise fashion. The
first step of the algorithm selects an “extreme” point
and calls it $p_1$. The next two steps sort the remaining
points. The order of the points is determined by the
slopes of the line segments formed by joining each point
to $p_1$. It is not hard to show that after these three steps
the points when taken in order, $p_1, p_2, \ldots, p_n$, form a
simple polygon, although this polygon may not be convex.
The Graham Scan algorithm traverses this polygon,
removing points until the resulting polygon is convex.
The main FOR loop iteration adds vertices to the
polygon under construction and the inner WHILE loop
removes vertices from the construction. A point is
removed when the angle test performed at line 6 reveals
that the angle at that vertex is reflex, i.e., greater than
180 degrees. It is easy to demonstrate that when a point
is removed, it must fall within the triangle defined by
three other points, $p_1$ and the two points that were
adjacent to the point removed. When the main FOR loop
is complete the convex hull has been constructed. The
execution of this algorithm is demonstrated in Figure 2.
For each removed point, the associated triangle is
indicated in bold lines, and in the text below the diagram.
Our certification trail relies on the fact that these
triangles can be determined quickly.

Algorithm CONVEXHULL(T)

Input: Set of points, T, in $R^2$

Output: Counterclockwise sequence of points in
$R^2$ which define the convex hull of T

1  Let $p_1$ be the point with the largest
   x coordinate (and smallest y to break ties)
2  For each point p (except $p_1$) calculate
   the slope of the line through $p_1$ and p
3  Sort the points (except $p_1$) from smallest
   slope to largest. Call them $p_2, \ldots, p_n$
4  $q_1 := p_1$; $q_2 := p_2$; $q_3 := p_3$; m = 3
5  FOR k = 4 to n DO
6      WHILE the angle formed by
       $q_{m-1}, q_m, p_k$ is > 180 degrees
7          DO m := m - 1 END
8      m := m + 1
9      $q_m := p_k$
   END FOR
10 FOR i = 1 to m DO, OUTPUT($q_i$)
   END FOR
end CONVEXHULL




Figure 2: Convex hull example.

First execution: In this execution the code CONVEXHULL
is used. The certification trail is generated by adding
an output statement within the WHILE loop. Specifically,
each time the WHILE loop test succeeds, that is, an angle
greater than 180 degrees is found and $q_m$ is about to be
removed, the four tuple consisting of
$q_m, q_{m-1}, p_1, p_k$ is output to the certification trail.
The final convex hull points $q_1, \ldots, q_m$ are also output
to the certification trail. Strictly speaking the trail
output does not consist of the actual points in $R^2$.
Instead, it consists of indices to the original input data.
This means if the original data consists of
$p_1, p_2, \ldots, p_n$ then rather than output the element in
$R^2$ corresponding to $p_i$ the number i is output.
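
A minimal Python sketch of the first execution is given here, assuming at least three input points in general position. The function names are illustrative, floating point slopes stand in for whatever exact arithmetic an implementation would use, and the sketch is not the code that was timed for Table 1.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_with_trail(points):
    """Return (hull, removals); both hold indices into `points`."""
    n = len(points)
    # Line 1: extreme point p_1 (largest x, smallest y to break ties).
    first = max(range(n), key=lambda i: (points[i][0], -points[i][1]))
    x1, y1 = points[first]

    def slope(i):
        dx, dy = points[i][0] - x1, points[i][1] - y1
        # A point directly above p_1 has the smallest possible slope.
        return float("-inf") if dx == 0 else dy / dx

    # Lines 2-3: sort the remaining points by slope through p_1.
    order = [first] + sorted((i for i in range(n) if i != first), key=slope)
    stack = order[:3]                      # line 4: q_1, q_2, q_3
    removals = []
    for k in order[3:]:                    # line 5
        # Line 6: a right turn at q_m means the angle exceeds 180 degrees.
        while len(stack) > 2 and cross(points[stack[-2]],
                                       points[stack[-1]], points[k]) < 0:
            removed = stack.pop()          # line 7
            # Trail entry: removed point and its triangle (q_{m-1}, p_1, p_k).
            removals.append((removed, stack[-1], first, k))
        stack.append(k)                    # lines 8-9
    return stack, removals
```

The returned pair corresponds to the trail contents described above: the list of four tuples followed by the hull, all expressed as indices into the input.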

Second execution: Let the certification trail consist
of a set of four tuples, $(x_1, a_1, b_1, c_1)$,
$(x_2, a_2, b_2, c_2)$, $\ldots$, $(x_r, a_r, b_r, c_r)$ followed
by the supposed convex hull, $q_1, q_2, \ldots, q_m$. The code
for CONVEXHULL is not used in this execution. Indeed,
the algorithm is dramatically different than CONVEXHULL.

It consists of five checks on the trail data.

• First, it checks that there is a one to one correspon-
dence between the input points and the points in
$\{x_1, \ldots, x_r\} \cup \{q_1, \ldots, q_m\}$.

• Second, it checks that for each $i \in \{1, \ldots, r\}$, $a_i$,
$b_i$, and $c_i$ are among the input points.

• Third, the algorithm checks that for each
$i \in \{1, \ldots, r\}$, $x_i$ lies within the triangle defined by
$a_i$, $b_i$, and $c_i$.

• Fourth, the algorithm checks that for each triple
of counterclockwise consecutive points on the sup-
posed convex hull, the angle formed by the points
is less than or equal to 180 degrees.

• Fifth, it checks that there is a unique point among
the points on the supposed convex hull which is a
local maximum. We say a point q on the hull is a local
maximum if its predecessor in the counterclockwise
ordering has a strictly smaller y coordinate and its
successor in the ordering has a smaller or equal y
coordinate.

If any of these checks fails then execution halts and
“error” is output. Otherwise the convex hull read from 
the trail is output. As mentioned above, the trail data 
actually consists of indices into the input data. This 
does not unduly complicate the checks above; instead 
it makes them easier. The correctness and adequacy of 
these checks must be proven. A complete formal proof 
is beyond the scope of this paper; instead a brief outline
of the proof will be given. 
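
The five checks can be sketched in Python as a single linear pass over the trail. This sketch assumes the trail has already been parsed into a list of four tuples of indices plus a list of hull indices, that the input is in general position with at least three hull points, and that "lies within" means strictly inside; all names are illustrative.

```python
ERROR = "error"


def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def strictly_inside(p, a, b, c):
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def check_hull_trail(points, removals, hull):
    """Return the hull (as indices) if all five checks pass, else ERROR."""
    n, m = len(points), len(hull)

    def valid(i):
        return isinstance(i, int) and 0 <= i < n

    # Check 1: removed points and hull points together cover the input once.
    labels = [entry[0] for entry in removals] + list(hull)
    if len(labels) != n or m < 3:
        return ERROR
    seen = [False] * n
    for i in labels:
        if not valid(i) or seen[i]:
            return ERROR
        seen[i] = True
    for x, a, b, c in removals:
        # Check 2: the triangle corners are input points.
        if not (valid(a) and valid(b) and valid(c)):
            return ERROR
        # Check 3: the removed point lies inside its triangle.
        if not strictly_inside(points[x], points[a], points[b], points[c]):
            return ERROR
    maxima = 0
    for i in range(m):
        prev, cur = points[hull[i - 1]], points[hull[i]]
        nxt = points[hull[(i + 1) % m]]
        # Check 4: every angle on the supposed hull is at most 180 degrees.
        if cross(prev, cur, nxt) < 0:
            return ERROR
        # Check 5: count local maxima; exactly one is allowed.
        if prev[1] < cur[1] and nxt[1] <= cur[1]:
            maxima += 1
    return list(hull) if maxima == 1 else ERROR
```

Because the trail stores indices, the one to one correspondence in the first check reduces to marking entries of a boolean array, which is why storing indices makes the checks easier rather than harder.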

Using our formal definition of certification trails, let 
D be the set of all finite planar point sets T. Let S 
be the set of convex polygons, with vertices in coun- 
terclockwise order (the restriction to counterclockwise 
ordering makes the convex hull unique). Then the 
problem we are considering is $\mathrm{HULL} : D \to S$ where
HULL(T) is the polygon in S that forms the convex 
hull of T. 

The description of the algorithms above defines func-
tions $F_1$ and $F_2$. We must show that both conditions of
Definition 3.2 hold. The following two lemmas, which 
we state without proof, are required. 

Lemma 4.2 Let P be a polygon on n points
$p_1, \ldots, p_n$. P is a convex polygon iff P is simple
and each angle $p_i p_j p_k$ is less than or equal to 180
degrees, where $i \in \{1, 2, \ldots, n\}$, $j = (i + 1) \bmod n$,
and $k = (i + 2) \bmod n$.

Lemma 4.3 If P is a non-simple polygon, then either 
P has more than one local maximum, or the interior angle
at some vertex is greater than 180 degrees. 

These are deceptively simple statements. Though 
they are intuitively obvious, a formal proof is difficult. 
It is interesting to note that some computer graphics 
texts give an incorrect test for determining convexity of
a polygon by omitting the check for simplicity required 
by Lemma 4.2. 
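
A small Python sketch shows why the angle test alone is not enough. A five pointed star drawn on the vertices of a convex pentagon turns left at every vertex, so it passes the angle test, yet it is not simple and has two local maxima. The coordinates and function names below are illustrative assumptions.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def all_angles_at_most_180(poly):
    """Angle test of Lemma 4.2 for a counterclockwise vertex sequence."""
    m = len(poly)
    return all(cross(poly[i], poly[(i + 1) % m], poly[(i + 2) % m]) >= 0
               for i in range(m))


def count_local_maxima(poly):
    """Count vertices whose predecessor is strictly lower and successor not higher."""
    m = len(poly)
    return sum(1 for i in range(m)
               if poly[i - 1][1] < poly[i][1] and poly[(i + 1) % m][1] <= poly[i][1])


pentagon = [(1, 0), (3, 0), (4, 2), (2, 4), (0, 2)]   # convex, counterclockwise
star = [pentagon[i] for i in (0, 2, 4, 1, 3)]         # self-intersecting
```

Both polygons pass the angle test, but only the pentagon has a single local maximum, which is the situation Lemma 4.3 describes and the reason the fifth check is needed.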

Recall that the first condition is: 

For all $d \in D$ there exist $s \in S$ and $t \in T$ such that
$F_1(d) = (s, t)$ and $F_2(d, t) = s$ and $(d, s) \in P$.

Intuitively, this means that if both executions perform
correctly then they will both output the convex hull of
the input, which is unique. Note that generation of the
certification trail does not affect the output of the
Graham Scan algorithm. Thus the condition on $F_1(d)$ is
satisfied by the correctness of the Graham Scan
algorithm, the proof of which is well known [11]. To show
that $F_2(d, t) = s$, note that a copy of s is contained on
the trail t. Our description of $F_2(d, t)$ states that s is
output unless one of the five checks above fails. It is
trivial to verify that the first three of these checks must
be satisfied. The fourth check cannot fail, since the
polygon described by s is convex (because $(d, s) \in P$).
Similarly, if the fifth check fails, then the polygon
described by s has two local maxima, and this is not
possible for a convex polygon.

The second condition is: 

For all $d \in D$ and all $t \in T$ either ($F_2(d, t) = s$ and
$(d, s) \in P$) or $F_2(d, t) = \mathrm{error}$.

Intuitively, this means that given an input and an arbi-
trary trail, $F_2(d, t)$ produces a solution to the problem
or flags an error.

Our definition of $F_2(d, t)$ states that the polygon Q
stored on the trail is output unless one of the five checks
fails. We must therefore demonstrate that if all five
checks succeed, then Q is the convex hull of the input
points d. Let H be the convex hull of the points d.
The first check guarantees that every point in d
is classified as a hull point or an interior point. The
second check guarantees that the triangles used to
identify interior points are formed from input points,
and the third check verifies that the interior points are
indeed inside their respective triangles. Note that we
do not attempt to verify that the triangles used are the
ones that would be produced by $F_1(d)$. In general, for
a given interior point, there may be several triangles of
input points in which it is contained. Together, the first
three checks imply that all points in H are also in Q,
since it is impossible for a hull point to be contained in
a triangle. Note that these three checks do not exclude
the possibility that interior points are present in Q, nor
do they guarantee that the ordering of the hull points in
Q is correct. The final two checks will accomplish this.
If the last two checks are satisfied, Lemma 4.3 states
that Q is simple, and therefore it must be convex by
Lemma 4.2.

Thus, Q is a convex polygon whose vertex set is a
superset of the vertices of H, i.e., H is contained in
Q. This implies that no other point from the input
set may be a vertex of Q, since any input point that
is not a hull point is interior to H and therefore
interior to Q. Finally, it is clear that the ordering of the
vertices of Q and H must be the same (although there
might appear to be two possible orderings, clockwise
and counterclockwise, a clockwise ordering will fail the
fourth check). Therefore if all five checks succeed, then
the output of $F_2(d, t)$ will be the convex hull of d.

This demonstrates that the algorithms described
meet the conditions of Definition 3.2, and are therefore
a certification trail solution to the convex hull problem.

Time complexity: In the first execution the sorting
of the input points takes $O(n \log n)$ time where n is
the number of input points. One can show that this cost
dominates and the overall complexity is $O(n \log n)$.

It is possible to implement the second execution so
that all five checks are done in $O(n)$ time. The first two
checks may be done in linear time since the certification
trail contains indices into the input data. The third
and fourth checks require a constant time calculation at
each point. Finally, the uniqueness of the local maximum
is clearly checkable in linear time.

Order-of-Magnitude Testing Speedup: It should be
noted that for the convex hull problem, we are seeing an
order of magnitude speedup for reasonable sized
problems. We believe this offers a dramatic
demonstration of the efficiency of our proposed software
testing technique using certification trails in comparison
with the 2-version programming technique.


Size          Basic    Prim. Exec.          Sec. Exec.   % Sav.   Speedup
                       (Also Gen. Trail)
(illegible)    0.64     0.67                 0.08        41.41     8.00
10000          1.38     1.40                 0.17        43.12     8.12
25000          3.89     3.84                 0.46        44.73     8.46
50000          8.44     8.50                 0.85        44.61     9.93
100000        17.36    17.68                 1.65        44.33    10.52

Table 1: Convex Hull


4.3 Sorting Example

This important problem has a massive literature. In
this section we will discuss how to apply the certifi-
cation trail approach to the sorting problem. Let us
assume that the sorting algorithm takes as input an
array of n elements and outputs an array of n elements.
The algorithm is supposed to place the data in
nondecreasing order.

To design a certification trail algorithm we must
discover the nature of the data that should be included
in the certification trail to allow quick computation
of the final output sorted array. Suppose that we
decide to use the output array itself as the certification
trail. We note that it is easy to check that this array is
in nondecreasing order by simply performing a single
pass over the array. Unfortunately, it is considerably
more difficult to make sure that this array contains
exactly the same elements as the original input array.
Indeed, this problem has a lower bound time complexity
of $\Omega(n \log n)$ in a comparison based model.

Because of this difficulty we use the permutation of 
the elements defined by the input and output data ar- 
rays as the certification trail. This permutation is com- 
puted by attaching an Item Number field to the data 
elements before sorting. The i-th item receives item
number i. After the elements are sorted, the permu- 
tation from input to output is obtained by reading the 
Item Numbers from the elements in their new order. 

The second execution reads the permutation from 
the trail and verifies that it is a permutation on n el- 
ements, i.e., that no numbers are repeated or omitted. 
This permutation is used to rearrange the input ele- 
ments in linear time. Finally the algorithm checks that 
these elements are now in non-decreasing order. 
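
The permutation trail can be sketched in a few lines of Python. Python's built-in sorted stands in for the quicker-sort routine used in our experiments, 0-based item numbers replace the Item Number field, and the function names are illustrative assumptions.

```python
ERROR = "error"


def sort_with_trail(items):
    """First execution: sort and record the item numbers in output order."""
    perm = sorted(range(len(items)), key=lambda i: items[i])
    return [items[i] for i in perm], perm


def check_sort_trail(items, perm):
    """Second execution: linear-time check of the permutation trail."""
    n = len(items)
    if len(perm) != n:
        return ERROR
    seen = [False] * n
    for i in perm:
        # No item number may be out of range, repeated or omitted.
        if not isinstance(i, int) or not 0 <= i < n or seen[i]:
            return ERROR
        seen[i] = True
    out = [items[i] for i in perm]       # rearrange the input in linear time
    for j in range(n - 1):
        if out[j] > out[j + 1]:          # must be nondecreasing
            return ERROR
    return out
```

The second execution performs only n - 1 key comparisons, which is consistent with the remark in Section 4.1 that more expensive keys are likely to increase the speedup.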


Size          Basic    Prim. Exec.          Sec. Exec.   % Sav.   Speedup
                       (Also Gen. Trail)
10000          0.28     0.30                 0.04        39.29     7.00
50000          1.80     1.90                 0.19        41.94     9.47
100000         3.96     4.08                 0.41        43.31     9.66
500000        23.95    24.69                 2.14        43.99    11.19
1000000       50.23    51.57                 4.38        44.31    11.47

Table 2: Sort


4.4 Certification Trails For Abstract Data 
Types 

Before we present the rest of our example algorithms 
we discuss a general technique applicable to many al- 
gorithms and data structures. 

An abstract data type is a data object or set of data 
objects together with a group of operations for manip- 
ulating the object(s). Each operation takes a (possibly 
empty) set of arguments, and some, but not necessarily 
all, operations return answers. Many algorithms make 
extensive use of abstract data types. 

We describe a method for automatically generating 
a certification trail for an algorithm which uses an ab- 
stract data type. This is done by modifying the ab- 
stract data type operations, so that during the first 
execution they generate a certification trail, and dur- 
ing the second execution they use the certification trail. 
Otherwise, these operations are identical to the original 
abstract data type operations, i.e., they take the same 
type of arguments and have the same return types. The 
object of creating and using the certification trail is to
allow a more efficient implementation of the abstract
data type during the second execution.

We illustrate this technique for the following ab- 
stract data type which we call Ordered Collection. An 
Ordered Collection will contain a set of pairs $(i, x)$
where i is an item number, and x is a real number value.
(This selection is made for simplicity of description; the
elements being stored could be more complex.) No two
elements of the set may have the same item number,
though several items may have a common value. We
define a total ordering on pairs by $(i, x) < (i', x')$ iff
$x < x'$ or ($x = x'$ and $i < i'$).

The following operations are defined on an Ordered
Collection:

INSERT(i,x) Add the element (i,x) to the set.

DELETE(i) Delete the element with item number i
from the set.

PREDECESSOR(i) Let (i,x) be the element in the
set with item number i. This operation returns
its predecessor, that is, the largest pair less than
(i,x). A special value SMALLEST is returned if
(i,x) is the smallest element in the set.

MIN Return the smallest element in the set.

NEAREST(x) Return the element from the set with
value closest to x. If there is a tie, return the
element with the smallest item number.

This small set of operations is chosen for concreteness;
several additional operations could be easily
defined. If an error occurs during any of these opera-
tions, for example, inserting pairs with duplicate item
numbers or attempting to delete a non-existent item, 
then the program terminates indicating an error. 

These operations may be modified to produce a cer-
tification trail during the first execution by modifying
the INSERT(i,x) and NEAREST(x) operations to do
the following (in addition to their normal function):

INSERT(i,x) After adding this element to the set,
perform a PREDECESSOR(i) operation and write
the item number of the answer to the certification
trail.

NEAREST(x) Write the item number of the answer
to the certification trail.
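
The trail-generating operations can be sketched in Python as follows. A sorted Python list maintained with the bisect module stands in for a balanced search tree, so this sketch does not match the running time of a real first execution. The class name, the use of None as the SMALLEST marker on the trail, and the use of exceptions to signal errors are all assumptions.

```python
import bisect

SMALLEST = None  # assumed marker written to the trail when there is no predecessor


class TrailGeneratingCollection:
    """Ordered Collection for the first execution (illustrative sketch)."""

    def __init__(self):
        self._pairs = []    # (value, item number), kept in ascending pair order
        self._value = {}    # item number -> value
        self.trail = []

    def predecessor(self, i):
        pos = bisect.bisect_left(self._pairs, (self._value[i], i))
        if pos == 0:
            return SMALLEST
        x, j = self._pairs[pos - 1]
        return (j, x)

    def insert(self, i, x):
        if i in self._value:
            raise ValueError("duplicate item number")
        bisect.insort(self._pairs, (x, i))
        self._value[i] = x
        pred = self.predecessor(i)
        # Extra step: record the predecessor's item number on the trail.
        self.trail.append(SMALLEST if pred is SMALLEST else pred[0])

    def delete(self, i):
        x = self._value.pop(i)           # KeyError if the item does not exist
        self._pairs.pop(bisect.bisect_left(self._pairs, (x, i)))

    def min(self):
        x, i = self._pairs[0]
        return (i, x)

    def nearest(self, x):
        # Linear scan for clarity; ties go to the smallest item number.
        v, i = min(self._pairs, key=lambda p: (abs(p[0] - x), p[1]))
        self.trail.append(i)             # extra step: record the answer
        return (i, v)
```

Only INSERT and NEAREST write to the trail; DELETE, PREDECESSOR and MIN are unchanged from an ordinary implementation.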

A typical implementation of an abstract data
type supporting the above operations would require
$\Omega(n \log n)$ time to process a sequence of n operations.
By using the certification trail, we can achieve linear
time for n operations during the second execution. This
includes the time necessary to check the trail for cor-
rectness as well as use it.

The implementation of the Ordered Collection for 
the second execution will be a structure called an in- 
dexed linked list. This is a doubly linked list, along 
with an array Items of pointers, indexed by item num- 
ber. The i-th element in this array points to the list 
node for the element with item number i (or is NULL if 
no element in the list has item number i). This allows 
us to find an element in constant time given its item 
number. The elements themselves are maintained in 
ascending order (according to the pair ordering given 
above) on a doubly linked list, i.e., each element has 
pointers to its successor and predecessor. In addition 
to the array, we maintain a variable Start which stores
the item number of the first element in the list. 

The abstract data type operations for the second 
execution are defined as follows: 

INSERT(i,x) Read the item number p from the trail.
p is the item number that would be the predecessor
of (i,x) if it were in the set. Items[p] points to
the list node for the element with item number p; call
this element (p, x_p). We can insert (i,x) after this
node using ordinary list operations. Before doing
so, however, we make three checks:

i. Check that Items[i] is currently NULL, i.e.,
there is not currently an element with item
number i in the set.

ii. Check that (i,x) is greater than (p, x_p).

iii. Check that (i,x) is less than the successor of
(p, x_p).

If these checks are satisfied, then (i,x) may be
inserted after (p, x_p). Set Items[i] to point to the
list node for (i,x).

Note that special cases occur at the beginning and 
end of the list. We omit the specifics of these cases, 
mentioning only that Start must be updated for 
insertions at the front of the list. 

DELETE(i) Check that Items[i] is not NULL, i.e.,
there is an element with item number i currently
in the set. If so, remove it from the linked list,
and set Items[i] to NULL. If we remove the first
element of the list we must also update Start.

PREDECESSOR(i) Items[i] points to the element
with item number i, and its predecessor may be
found by following the appropriate pointer.

MIN The variable Start indicates the item number of
the first element on the list, i.e., the minimum element.
Items[Start] therefore points to this element.

NEAREST(x) Read the index i from the trail.
Items[i] points to the element having this item
number; call it (i, v). To verify that this is the correct
answer we will have to check one of its neighbors.
If v < x, then only the successor of (i, v)
could have a value closer to x. Otherwise, only the
predecessor is a candidate. Check the appropriate
neighbor.
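
The second-execution operations above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the names `IndexedLinkedList`, `Node`, and `TrailError` are hypothetical, the trail is modeled as a plain sequence, a trail entry of `None` is assumed to signal an insertion at the front of the list (one of the special cases whose specifics are omitted in the text), and pairs are assumed to be ordered by value with ties broken by item number.

```python
class TrailError(Exception):
    """A check against the certification trail failed."""


class Node:
    def __init__(self, item, value):
        self.item, self.value = item, value
        self.prev = self.next = None

    def key(self):
        # Assumed pair ordering: by value, ties broken by item number.
        return (self.value, self.item)


class IndexedLinkedList:
    def __init__(self, capacity, trail):
        self.items = [None] * capacity  # Items: item number -> list node
        self.start = None               # Start: item number of first node
        self.trail = iter(trail)        # data written by the first execution

    def _read(self):
        try:
            return next(self.trail)
        except StopIteration:
            raise TrailError("trail exhausted") from None

    def _node(self, i):
        ok = isinstance(i, int) and 0 <= i < len(self.items)
        if not ok or self.items[i] is None:
            raise TrailError(f"no element with item number {i!r}")
        return self.items[i]

    def insert(self, i, x):
        p = self._read()  # claimed predecessor; None means "insert at front"
        if not 0 <= i < len(self.items) or self.items[i] is not None:
            raise TrailError("check i failed: item number unavailable")
        node = Node(i, x)
        if p is None:
            pred = None
            succ = None if self.start is None else self.items[self.start]
        else:
            pred = self._node(p)
            succ = pred.next
        if pred is not None and not pred.key() < node.key():
            raise TrailError("check ii failed: not greater than predecessor")
        if succ is not None and not node.key() < succ.key():
            raise TrailError("check iii failed: not less than successor")
        node.prev, node.next = pred, succ
        if pred is None:
            self.start = i
        else:
            pred.next = node
        if succ is not None:
            succ.prev = node
        self.items[i] = node

    def delete(self, i):
        node = self._node(i)
        if node.prev is None:
            self.start = None if node.next is None else node.next.item
        else:
            node.prev.next = node.next
        if node.next is not None:
            node.next.prev = node.prev
        self.items[i] = None

    def predecessor(self, i):
        prev = self._node(i).prev
        return None if prev is None else prev.item

    def min(self):
        return self.start

    def nearest(self, x):
        node = self._node(self._read())  # claimed answer
        if node.value < x:   # only the successor could be closer to x
            other = node.next
            ok = other is None or other.value - x >= x - node.value
        else:                # only the predecessor could be closer to x
            other = node.prev
            ok = other is None or x - other.value >= node.value - x
        if not ok:
            raise TrailError("a neighbor is closer than the claimed answer")
        return node.item


lst = IndexedLinkedList(3, trail=[None, 0, 0, 2])
lst.insert(0, 10)   # trail entry None: insert at the front
lst.insert(1, 30)   # trail entry 0: predecessor is item 0
lst.insert(2, 20)   # trail entry 0: predecessor is item 0
print(lst.nearest(22), lst.min(), lst.predecessor(1))
```

Every operation touches a constant number of nodes, which is where the linear bound for n operations comes from. The neighbor test in `nearest` uses signed differences rather than absolute values so that a run of equal values cannot hide a closer element further along the list.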




Although our example uses elements that contain
item numbers, it is not necessary that the abstract data
type be defined in this way. The insert operation of an
abstract data type may be modified to tag elements
with item numbers as they are inserted.

Variations on this scheme are possible. For example,
by modifying the DELETE(i) and NEAREST(x)
operations so that they also write the item numbers of
predecessors to the trail, it is possible to use a singly
linked list during the second execution. More sophisticated
schemes, involving marking list nodes for deletion
and delayed checks, allow the use of singly linked
lists without requiring DELETE(i) and NEAREST(x)
to produce predecessor information.

The technique in this example generalizes to other
abstract data types supporting a predecessor operation.
In fact, a somewhat weaker condition often suffices; it
is sufficient that the specific implementation of the
abstract data type allow the predecessor of an element
to be found at the time the element is inserted. The
abstract data type itself need not support a predecessor
operation. This technique is used in four of our
example algorithms.

Using this technique, it is possible to reuse the first
execution code, except for the code implementing the
abstract data type operations. One advantage of this
is that it may be possible to add extra checking to such
code, such as bounds checking and checks on pointer
references, that may be too expensive to include in the
first execution. Of course, the two programs may be
developed separately as long as the specifications agree
on the use of the abstract data type.

Space does not permit a full proof of correctness of
this scheme. A proof proceeds by establishing the
following invariants on the indexed linked list used in the
second execution.


i. The pairs in the linked list are in order from smallest
to largest.

ii. Each element of the Items array is either NULL or
points to one of the nodes in the linked list.

iii. If Items[i] is not NULL, then the list node pointed
to by it stores an element with item number i.
(Note that this implies that each list node is
pointed to at most once.)

iv. Every node in the list is pointed to by some entry
in Items.

v. Start is the item number of the first element in the list.

These conditions are clearly satisfied by an indexed
linked list containing no elements (i.e., before any
operations have been performed). Inspection of operations
that query the list (MIN and NEAREST for example)
shows that they function correctly if the above
conditions are met. It is easy to prove correctness of the
certification trail by demonstrating that the operations
maintain a one-to-one correspondence between the pairs
in the linked list and the elements in the abstract data
type and that the above invariants are preserved.

4.5 Shortest Path Example

This is another classic problem which has been
examined extensively in the literature. Our approach is
applied to a variant of the Dijkstra algorithm [3] as
explicated in [17]. We are concerned with the single
source problem, i.e., given a graph and a vertex s, find
the shortest path from s to v for every vertex v.

The algorithm for this problem which has the fastest
asymptotic time complexity uses fusion trees and is
given in [5]. This algorithm, however, appears to have
a large constant of proportionality, and therefore we do
not use it.

We use the techniques just discussed to implement
the certification trail for this problem. A full
description may be found in a technical report [15].


Size          Basic         Prim. Exec.         Sec. Exec.    % Sav.        Speedup
                            (Also Gen. Trail)
100,1000      0.04          0.05                0.02          [illegible]   2.00
[illegible]   0.15          0.16                0.06          [illegible]   2.50
[illegible]   [illegible]   0.33                [illegible]   29.03         2.82
[illegible]   0.70          [illegible]         [illegible]   29.29         [illegible]
[illegible]   1.58          [illegible]         [illegible]   32.91         [illegible]
2500,25000    2.06          2.15                0.55          34.47         [illegible]

Table 3: Shortest Path


4.6 Huffman Tree Example


This is another classic algorithmic problem, and one
of the original solutions was found by Huffman [7]. It
has been used extensively to perform data compression
through the design and use of so-called Huffman codes.
These codes are prefix codes which are based on the
Huffman tree and which yield excellent data compression
ratios. The tree structure and the code design are
based on the frequencies of individual characters in the
data to be compressed. Here we are concerned exclusively
with the Huffman tree. See [7] for information
about the coding application.

Definition 4.4 The Huffman tree problem is the following:
Given a sequence of frequencies (positive integers)
f[1], f[2], ..., f[n], construct a tree with n leaves
and with one frequency value assigned to each leaf so
that the weighted path length is minimized. Specifically,
the tree should minimize the following sum:

$$\sum_{l_i \in \mathrm{LEAF}} \mathrm{len}(i)\, f[i]$$

where LEAF is the set of leaves, len(i) is the length of
the path from the root of the tree to the leaf $l_i$, and
f[i] is the frequency assigned to the leaf $l_i$.
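
As a worked illustration of the quantity in Definition 4.4, the following sketch builds an optimal tree with the standard greedy merge of the two smallest subtrees and returns the weighted path length together with len(i) for each leaf. It is a plain textbook construction written for clarity, not the certification trail method of [15]; the function name `huffman_weighted_path_length` is hypothetical, and carrying explicit leaf lists makes this version quadratic in the worst case.

```python
import heapq


def huffman_weighted_path_length(freqs):
    """Return (cost, depths), where depths[i] is len(i) for leaf i."""
    # Heap entries: (subtree frequency, unique tie-breaker, leaves in subtree).
    # The tie-breaker guarantees the leaf lists are never compared.
    heap = [(f, i, [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    depths = [0] * len(freqs)
    counter = len(freqs)
    while len(heap) > 1:
        f1, _, leaves1 = heapq.heappop(heap)
        f2, _, leaves2 = heapq.heappop(heap)
        merged = leaves1 + leaves2
        for leaf in merged:
            depths[leaf] += 1  # every leaf under the new parent moves down
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    cost = sum(d * f for d, f in zip(depths, freqs))
    return cost, depths


cost, depths = huffman_weighted_path_length([5, 9, 12, 13, 16, 45])
print(cost, depths)
```

For the frequencies 5, 9, 12, 13, 16, 45 the successive merges have weights 14, 25, 30, 55, and 100, whose total of 224 equals the weighted path length.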

A full description of the method we employ to gener- 
ate and use a certification trail is detailed in a technical 
report [15]. 


Size          Basic         Prim. Exec.         Sec. Exec.    % Sav.        Speedup
                            (Also Gen. Trail)
5000          0.81          0.87                0.16          36.42         5.06
10000         1.76          1.86                0.33          37.78         5.33
25000         6.01          6.30                1.02          39.10         5.89
50000         10.62         11.14               1.70          39.55         6.25

Table 4: Huffman tree
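
The last two columns can be recomputed from the three timing columns. The reading used below is an assumption inferred from the printed figures rather than a definition quoted from this section: "% Sav." is taken as the time saved by running the primary and secondary executions instead of two basic executions (the 2-version baseline), and "Speedup" as the ratio of basic to secondary execution time. The function names are hypothetical.

```python
def percent_savings(basic, primary, secondary):
    """Savings of primary + secondary over two basic runs, in percent."""
    two_version = 2 * basic             # assumed 2-version baseline
    with_trail = primary + secondary    # certification trail approach
    return 100.0 * (two_version - with_trail) / two_version


def speedup(basic, secondary):
    """How many times faster the secondary execution is than the basic one."""
    return basic / secondary


# Timing columns of Table 4: (size, basic, primary, secondary)
table4 = [
    (5000, 0.81, 0.87, 0.16),
    (10000, 1.76, 1.86, 0.33),
    (25000, 6.01, 6.30, 1.02),
    (50000, 10.62, 11.14, 1.70),
]
for size, basic, primary, secondary in table4:
    sav = percent_savings(basic, primary, secondary)
    spd = speedup(basic, secondary)
    print(f"{size:>6}  {sav:6.2f}  {spd:5.2f}")
```

Under this reading the first row gives (1.62 - 1.03) / 1.62, or about 36.42 percent, and 0.81 / 0.16, or about 5.06, matching the printed values.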


4.7 Other problems 

We report timing data for five other problems, the 
“Manhattan skyline” problem, computation of Voronoi 
diagrams, longest increasing subsequence, the closest 
pair problem, and line segment intersection. Space per- 
mits only a brief description of these problems, rather 
than a full exposition of the certification trail tech- 
niques used. 

The “Manhattan skyline” problem is: Given a set 
of rectangles with collinear bottom edges, compute the 
polygonal outline of the union of the rectangles [9]. 

The Voronoi diagram is a fundamental concept in
computational geometry [11]. Given a set of points P
in the plane, the Voronoi diagram is a partition of the
plane into regions such that each region consists of all
points closer to a given p ∈ P than to any other
point in P. Computation of the Voronoi diagram is
an important step in many problems involving point
location.

The next problem we consider is, given a sequence 
of integers, find the longest (not necessarily unique) 
strictly increasing subsequence. 
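
To make the problem statement concrete, here is a sketch of a standard O(n log(n)) method based on binary search over the smallest known tail of each subsequence length. It is offered only as an illustration of the problem; it is not claimed to be the algorithm the authors implemented, and the function name is hypothetical.

```python
from bisect import bisect_left


def longest_increasing_subsequence(seq):
    """Return one longest strictly increasing subsequence of seq."""
    tail_idx = []   # tail_idx[k]: index of the smallest tail of length k + 1
    tail_val = []   # values at those indices, kept sorted for bisect
    prev = [None] * len(seq)  # back-pointers used to rebuild the answer
    for idx, v in enumerate(seq):
        # bisect_left (not bisect_right) keeps the subsequence strict.
        k = bisect_left(tail_val, v)
        prev[idx] = tail_idx[k - 1] if k > 0 else None
        if k == len(tail_idx):
            tail_idx.append(idx)
            tail_val.append(v)
        else:
            tail_idx[k] = idx
            tail_val[k] = v
    out = []
    idx = tail_idx[-1] if tail_idx else None
    while idx is not None:
        out.append(seq[idx])
        idx = prev[idx]
    return out[::-1]


print(longest_increasing_subsequence([3, 1, 4, 1, 5, 9, 2, 6]))
```

Because the answer need not be unique, a different but equally long subsequence would be just as valid for the same input.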


Size          Basic         Prim. Exec.         Sec. Exec.    % Sav.        Speedup
                            (Also Gen. Trail)
1000          0.27          0.26                0.12          29.63         2.25
5000          1.69          1.65                0.57          34.32         2.96
10000         3.91          3.72                1.14          37.85         3.43
15000         6.08          5.78                1.77          37.91         3.44
20000         8.53          8.27                2.33          37.87         3.66

Table 5: Skyline


Size          Basic         Prim. Exec.         Sec. Exec.    % Sav.        Speedup
                            (Also Gen. Trail)
100           0.04          0.04                0.03          12.50         1.33
500           0.24          0.26                0.19          6.25          1.26
1000          0.51          0.51                0.39          11.76         1.31
5000          2.75          2.82                2.03          11.82         1.35
10000         5.79          5.89                4.06          14.08         1.43
50000         40.15         40.63               22.00         22.00         1.83

Table 6: Voronoi Diagram


Size          Basic         Prim. Exec.         Sec. Exec.    % Sav.        Speedup
                            (Also Gen. Trail)
10000         0.13          0.14                0.04          30.77         3.25
50000         0.78          0.81                0.22          33.97         3.55
100000        1.61          1.70                0.44          33.54         3.66
500000        9.17          9.32                2.22          37.08         4.13
1000000       18.66         19.58               4.46          35.58         4.18

Table 7: Longest Increasing Subsequence


Given a set of points P in the plane, the Closest 
Pair problem is that of finding the pair of points with 
minimum distance over all pairs in the set. 


Size          Basic         Prim. Exec.         Sec. Exec.    % Sav.        Speedup
                            (Also Gen. Trail)
10000         0.26          0.27                0.07          34.62         3.71
50000         1.45          1.55                0.36          34.14         4.03
100000        3.06          3.26                0.72          34.97         4.25
500000        16.84         18.02               3.62          35.75         4.65

Table 8: Closest Pair


Given a set of line segments in the plane, the line 
intersection problem is the problem of determining all 
intersections of line segments in this set. 

For the first four problems, algorithms running in
O(n log(n)) time were implemented for the first
execution. The second execution, using certification trails,
runs in linear time. The first execution algorithm used
for line intersection runs in O((k + n) log(n)) time
where k is the number of intersections and n the number
of points. The second execution runs in O(k + n)
time. Note that k may be quadratic in n.


Size          Basic         Prim. Exec.         Sec. Exec.    % Sav.        Speedup
                            (Also Gen. Trail)
1000          0.47          0.49                0.04          43.62         11.75
2500          1.45          1.53                0.12          43.10         12.08
5000 (?)      3.33          3.47                0.26          43.99         12.81
10000 (?)     7.72          7.83                0.60          45.08         12.87
[illegible]   24.00         24.12               1.75          46.10         13.71

Table 9: Line Segment Intersection


5 Concluding Discussion

Certification trails have heretofore been discussed
principally from a theoretical perspective. In this paper
we have presented experimental timing data which
illustrates the advantages of the certification trail
technique for software testing over the 2-version
programming technique. We have further presented techniques
and analytical results for several new algorithms which
further support the significance of the certification trail
technique by demonstrating its broadening applicability.
It should be appreciated that the scope of our
experimental investigation is not limited to the
algorithms considered here; numerous other algorithms we
have considered could have been discussed, and we
continue to work on new applications. It should also be
pointed out that in addition to the timing experiments
reported here, software fault injection experiments have
also been conducted which verify the detection
capabilities of the certification trail method. The breadth of
applicability of the certification trail technique
continues to expand along with the credibility of its
advantages. Increasingly, the certification trail method can
be viewed as a competitive software testing alternative.